Abstract

Complex systems can be characterized by equivalence classes of their elements, defined
according to system-specific rules. We propose a generalized preferential attachment model to
describe the class size distribution. The model postulates preferential growth of the existing
classes and a steady influx of new classes. We investigate how the distribution depends on the
initial conditions and changes from a pure exponential form for zero influx of new classes to a
power law with an exponential cutoff when the influx of new classes is substantial. We apply the
model to study the growth dynamics of the pharmaceutical industry.

Many diverse systems in physics, economics, and biology share two basic similarities in their
growth dynamics: (i) the system does not have a steady state and is growing; (ii) basic units are
born and agglomerate to form classes. Classes grow in size preferentially, depending on their
existing size. In economic systems, units are products and classes are firms. In social systems,
units are human beings and classes are cities. In biological systems, units can be bacteria and
classes are bacterial colonies.

The probability distribution function p(k) of the class size k in the systems mentioned above
shows a universal behavior, $p(k) \sim k^{-\tau}$ with $\tau \approx 2$. Other possible values of
$\tau$ have also been discussed and reported. Also, for most of these systems p(k) has an
exponential cutoff, which is often assumed to be a finite size effect of the databases analyzed.
Several models explain $\tau \approx 2$ but none explains the exponential cutoff of p(k).
Moreover, these models describing $p(k) \sim k^{-\tau}$ are not suitable to describe
simultaneously systems for which $p(k) \sim \exp(-\gamma k)$. Here we present a model with a
simple set of rules that describes p(k) over the entire range of k, i.e., a power law with an
exponential cutoff. We show that the exponential cutoff of the power law is not due to finite size
but is an effect of the initial conditions from which the system starts to evolve. We also show
that the functional form of p(k) depends on the initial conditions of our model and changes from a
pure exponential to a pure power law (with $\tau = 2$) via a power law with an exponential cutoff.
We justify our model by empirical analysis of a recently constructed pharmaceutical industry
database (PHID).

We now present a model with the following rules:

1. At time t = 0 there exist N classes, each with a single unit.

2. At each simulation step:

• (a) With probability b (0 < b < 1) a new class with a single unit is born.

• (b) With probability $\lambda$ ($0 < \lambda < 1$) a randomly selected class grows by one
unit. The class that grows is selected with probability proportional to the number of units
it already has ("preferential attachment").

• (c) With probability $\mu$ ($0 < \mu < 1$, $\mu < \lambda$) a randomly selected class shrinks
by one unit. The class that shrinks is selected with probability proportional to the number
of units it already has ("preferential detachment").
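
The rules above can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the authors' code: the three events (a), (b), (c) are drawn independently at every step, a class is selected in proportion to its size by picking one unit uniformly at random, and classes that shrink to zero units are dropped from the returned list. The function name `simulate` and its parameters are hypothetical.

```python
import random
from collections import Counter

def simulate(n_initial, b, lam, mu, steps, seed=0):
    """Apply rules 1 and 2(a)-(c); return the sizes of all non-empty classes."""
    rng = random.Random(seed)
    sizes = [1] * n_initial            # rule 1: N classes with a single unit each
    units = list(range(n_initial))     # one entry per unit, holding its class index
    for _ in range(steps):
        if rng.random() < b:           # rule 2(a): a new single-unit class is born
            sizes.append(1)
            units.append(len(sizes) - 1)
        if units and rng.random() < lam:
            # rule 2(b): a uniformly chosen unit selects its class with
            # probability proportional to the class size
            c = units[rng.randrange(len(units))]
            sizes[c] += 1
            units.append(c)
        if units and rng.random() < mu:
            # rule 2(c): remove a uniformly chosen unit (swap with last, then pop)
            i = rng.randrange(len(units))
            c = units[i]
            units[i] = units[-1]
            units.pop()
            sizes[c] -= 1
    # Assumption: classes that lost all their units are treated as dead.
    return [s for s in sizes if s > 0]

sizes = simulate(n_initial=100, b=0.1, lam=0.8, mu=0.2, steps=20_000, seed=1)
histogram = Counter(sizes)             # histogram[k] = number of classes with k units
print(len(sizes), max(sizes))
```

Keeping one list entry per unit makes the size-proportional selection a single uniform draw, at the cost of memory proportional to the total number of units.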

In the continuum limit the proposed growth mechanism gives rise to a master equation for
$p(k, t_i, t)$, the probability that a class i born at simulation step $t_i$ has k units at
step t:

$$\frac{\partial p(k,t_i,t)}{\partial t} = \lambda\,\frac{k-1}{g(t)}\,p(k-1,t_i,t) + \mu\,\frac{k+1}{g(t)}\,p(k+1,t_i,t) - (\lambda+\mu)\,\frac{k}{g(t)}\,p(k,t_i,t), \qquad (1)$$

where $g(t) = N + (\lambda - \mu + b)t$ is the total number of units at simulation step t and
$p(1, t_i, t_i) = 1$. Equation (1) is the generalization of the master equation of birth and
death processes. The analytical solution of Eq. (1) is given by

$$p(k) = \frac{N}{N+bt}\,p(k,0,t) + \frac{b}{N+bt}\int_0^t dt_i\,p(k,t_i,t), \qquad (2)$$

where the functional form of $p(k, t_i, t)$ is given in Ref. [8]. The lengthy derivation of the
full solution of Eq. (1), which is a power law [the second term of Eq. (2)] with an exponential
cutoff [the first term of Eq. (2)], will be presented elsewhere; here we present simulation
results. First we discuss two limiting solutions of Eq. (1).

• Case i: No new classes are born (b = 0). The growth of the system is solely due to
the preferential attachment of new units to the N pre-existing classes. In this case
(Fig. 1a)

$$p(k) \sim e^{-\gamma k}. \qquad (3)$$

This limiting case can be regarded as one initial condition of the model, in which the
birth and death of classes are not allowed. We observe that this initial condition results
in a pure exponential distribution of the number of units inside classes.

• Case ii: At t = 0, N = 0, and new classes are born with probability $b \neq 0$. In this
case, for large times p(k) is a pure power law

$$p(k) \sim k^{-\tau}, \qquad \tau = 2. \qquad (4)$$

This limiting case can be regarded as another initial condition of the model, in which the
birth and death of classes are allowed, starting from N = 0 classes. We observe
that this initial condition results in a pure power law distribution of the number of
units inside classes.

This case is identical to the Simon model and can be understood by the following
arguments. From Case i we know that when the number of classes remains constant,
p(k) decays exponentially with k. The power law of Case ii results from the superposition
of many exponentials with different decay constants, each arising from classes
born at different times (Fig. 1a).
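
The superposition argument of Case ii can be checked numerically. The sketch below is an illustration under assumptions that are not taken from the text: a class born at step $t_i$ is given a geometric size distribution with mean $t/t_i$ (a pure-growth idealization with $\mu = 0$ and b much smaller than $\lambda$), and birth times are spread uniformly over (0, t]. The function name `mixture_pk` is hypothetical.

```python
def mixture_pk(k, n_cohorts=20_000):
    """Average geometric size distributions over uniformly spread birth times.

    Assumption (not from the text): a class born at fraction x = t_i / t of the
    elapsed time has P(size = k) = x * (1 - x)**(k - 1), i.e. mean size 1 / x.
    """
    total = 0.0
    for j in range(n_cohorts):
        x = (j + 0.5) / n_cohorts              # midpoint rule on (0, 1)
        total += x * (1.0 - x) ** (k - 1)
    return total / n_cohorts

# Compare the numerical mixture with the closed form 1 / (k (k + 1)).
for k in (1, 10, 100):
    print(k, mixture_pk(k), 1.0 / (k * (k + 1)))
```

Under these assumptions the mixture integrates to 1/(k(k+1)), which decays as $k^{-2}$ for large k even though every individual cohort decays exponentially.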

We next present a mean-field interpretation of the result $\tau \approx 2$. At any moment $t_0$
the number of units in the already-existing classes is $g(t_0)$. Suppose a new class consisting
of one unit is created at time $t_0$. According to rules 2b and 2c, its growth rate is
proportional to $1/g(t_0)$. Neglecting the effect of the influx of new classes on $g(t_0)$, the
average size k of this class born at $t_0$ is proportional to $1/g(t_0)$. So the classes born at
times $t > t_0$ remain smaller than the classes born earlier. If we sort the classes according to
their size, the rank R(k) of a class is proportional to the time of its creation,
$R(k) \propto t_0$. Thus $k \sim 1/g(t_0) \sim 1/t_0 \sim 1/R(t_0)$ and we arrive at the standard
formulation of Zipf's law, according to which the size k of a class is inversely proportional to
its rank. If we take into account the decrease of the growth rate with the influx of new classes,
one can show after some algebra that $k \sim R^{-(\lambda-\mu)/(\lambda-\mu+b)}$, which includes
$k \sim R^{-1}$ as a limiting case for $b \to 0$. Since R(k) is the number of classes whose size
is larger than k, we can write in the continuum limit $R(k) \sim \int_k^\infty p(k')\,dk'$ and
hence $p(k) \sim k^{-2-b/(\lambda-\mu)}$.
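
One way to carry out the algebra is sketched below. It assumes the mean-field rate equation $dk/dt = (\lambda-\mu)\,k/g(t)$ for the average size of a class born at $t_0$, which is consistent with rules 2b and 2c but is an assumption of this sketch rather than a derivation taken from the text; it also assumes $t_0$ large enough that $g(t_0) \propto t_0$.

```latex
\begin{align*}
\frac{dk}{dt} &= (\lambda-\mu)\,\frac{k}{g(t)}, \qquad
  g(t) = N + (\lambda-\mu+b)\,t, \qquad k(t_0) = 1,\\
k(t) &= \left[\frac{g(t)}{g(t_0)}\right]^{\frac{\lambda-\mu}{\lambda-\mu+b}}
  \sim t_0^{-\frac{\lambda-\mu}{\lambda-\mu+b}}
  \sim R^{-\frac{\lambda-\mu}{\lambda-\mu+b}} \qquad (R \propto t_0),\\
R(k) &\sim k^{-\left(1+\frac{b}{\lambda-\mu}\right)}, \qquad
p(k) = -\frac{dR}{dk} \sim k^{-2-\frac{b}{\lambda-\mu}}.
\end{align*}
```

Setting $b \to 0$ in the last line recovers $k \sim R^{-1}$ and $p(k) \sim k^{-2}$.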

The full solution of Eq. (1), a power law with an exponential cutoff, can be interpreted
using the following arguments. We start with N classes, which are colored red, and let the
newly born classes be colored blue. Due to the preferential attachment rule, the red classes
remain on average larger than the blue classes. Thus for large k, p(k) is governed by the
exponential distribution of the red classes (Case i), while for small k, p(k) is governed by the
power law distribution of the blue classes (Case ii) (Fig. 1b).

Now we apply this model to describe the statistical properties of the growth dynamics of
business firms in the pharmaceutical industry. PHID records quarterly sales figures of 48 819
pharmaceutical products commercialized in the European Union and North America from
September 1991 to June 2001. The products in PHID can be classified into five different
hierarchical levels A, B, C, D, and E (Fig. 2). Each level has a different number of
classes and different initial conditions (Table I).

We observe positive correlations between the number of units (products) appearing or
disappearing per year and the number of units in the classes at a particular hierarchical
level (Table II). This empirical observation supports the preferential birth and death
mechanism (rules 2b and 2c) used in our model.
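
As an illustration of this kind of check, the sketch below computes a correlation coefficient between class sizes and the number of units born in each class. The text does not state which correlation measure was used, so Pearson's coefficient is an assumption here, and the numbers are made-up placeholders, not PHID data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values (NOT PHID data): units already in each class and
# units born in that class during one year.
class_sizes = [10, 40, 90, 200, 650]
units_born = [1, 5, 8, 25, 60]
print(round(pearson(class_sizes, units_born), 3))
```

A coefficient close to 1 means that larger classes gain more units, which is the signature expected from a size-proportional growth rule.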

For levels A and B, where the number of classes did not change, we obtain an exponential
distribution (Figs. 3a, 3b), as predicted by limiting Case i of the model. For levels C and
D, a weak departure from the exponential functional form (Figs. 3c, 3d) is due to the slight
growth in the number of classes.

The full solution predicted by our model, i.e., an initial power law followed by an
exponential decay of p(k), is observed empirically for level E (Fig. 4). For level E we observe
a power law with $\tau = 1.97$ for k < 200 and an exponential cutoff for k > 200. From
the discussion above with red and blue classes we may infer that the exponential part of
p(k) arises from pre-existing firms, while the power law part of p(k) represents the young
firms that enter the market. We conclude by noting that our model is in agreement with the
empirical observations, where we observe p(k) to be either a pure exponential or a power law
with an exponential cutoff. Our analysis also sheds light on the emergence of the exponent
$\tau \approx 2$ observed in certain biological, social, and economic systems.
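
The fitting procedure behind the quoted exponent is not described in detail. A minimal sketch of one common approach is given below, assuming an ordinary least-squares fit to the log-log cumulative distribution below a chosen cutoff; the function names `ccdf` and `fit_tau` and the synthetic Zipf-like sample are hypothetical and are not PHID data.

```python
import math

def ccdf(sizes):
    """Return the distinct sizes k and the fraction of classes with size >= k."""
    n = len(sizes)
    ordered = sorted(sizes)
    ks, fracs = [], []
    for i, k in enumerate(ordered):
        if i == 0 or k != ordered[i - 1]:      # first occurrence of each size
            ks.append(k)
            fracs.append((n - i) / n)
    return ks, fracs

def fit_tau(sizes, k_max):
    """Least-squares slope of log CCDF versus log k for k < k_max.

    The cumulative distribution decays as k**-(tau - 1), so tau = 1 - slope.
    """
    ks, fracs = ccdf(sizes)
    pts = [(math.log(k), math.log(f)) for k, f in zip(ks, fracs) if k < k_max]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return 1.0 - sxy / sxx

# Synthetic sample (assumption): the class of rank r has size 10000 // r,
# i.e. an idealized Zipf law, for which tau should come out close to 2.
zipf_like = [10_000 // r for r in range(1, 10_001)]
print(fit_tau(zipf_like, k_max=100))
```

Restricting the fit to k below the cutoff mirrors the separation into a power law regime and an exponential regime; the cutoff itself has to be chosen by inspection or by a separate criterion.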
Level                                        A      B      C      D      E
Total number of classes in each level       13     84    259    432   3913
Number of classes born in each level         0      0      8     20    458
Number of classes died in each level         0      0      0      0    252

TABLE I: Two different initial conditions for classes in PHID levels: (i) For levels A and B
there is no birth or death of classes. The system grows through the birth and death of units in
the N pre-existing classes (13 for level A and 84 for level B). (ii) For levels C and D the
system grows not only through the birth or death of classes but also through the birth and death
of units inside classes.



Level                                               A      B      C      D      E
Correlation between number of units born and
existing number of units in classes              0.93   0.87   0.84   0.82   0.70
Correlation between number of units died and
existing number of units in classes              0.88   0.86   0.80   0.78   0.75

TABLE II: Correlation of the birth and death of units with the existing number of units in
classes for each level in PHID. The observed correlation justifies the preferential birth and
death of units, i.e., rules 2b and 2c of our model.
[Figure 1 panel title: Simulation]

FIG. 1: Simulation results of the model. (a) Symbols are data points from simulation, solid lines
are regression fits. We observe that for b = 0 (i.e., no class creation) the cumulative
probability distribution is a pure exponential, while for N = 0 (i.e., we start with zero initial
classes) it is a pure power law $k^{-(\tau-1)}$ with exponent $\tau = 2$. (b) We observe that as
we change the ratio of the number of pre-existing classes to the number of newly born classes,
p(k, t) changes from a pure power law to a pure exponential.
[Figure 2 diagram: PHID (hierarchical classification), 48 819 products in total. Level A: 13
classes; Level B: 84 classes; Level C: 259 classes; Level D: 432 classes; Level E: 3913 classes.]

FIG. 2: In the pharmaceutical industry, products can be classified according to five levels. When
a particular product arrives on the market, it is labeled under one of the 13 classes of level A,
one of the 84 classes of level B, and so on. Since the 19th century, the number of classes in
levels A and B has remained constant even though the number of products within each class has
grown dramatically. Over the period of our empirical analysis the number of classes in levels C
and D increased by 3% and 5%, respectively. Products can also be grouped by the firms that market
them (classification level E). In the figure we give the number of classes in each level in 1991.
[Figure 3 panel title: Empirical Results, PHID]

FIG. 3: Panels (a)-(d) correspond to levels A-D, respectively. Products in the pharmaceutical
industry are classified into levels A, B, C, and D. Levels A and B have fixed numbers of classes;
the number of classes in levels C and D increases by 3% and 5%, respectively, over the period of
our analysis. For instance, for level A (Fig. 3a), which contains only 13 classes, the
distribution is estimated from 13 random integer numbers, which corresponds to classifying 48,819
products into 13 classes. Symbols represent data points in each level (a)-(d), while solid lines
are predictions of the model. Cumulative probability distributions for all levels are pure
exponentials, as predicted by the model.
[Figure 4 panel title: Empirical Results, PHID; horizontal axis: Class Size, k]

FIG. 4: Empirical results from PHID level E. The classes analyzed here are the firms. Circles are
data points, solid lines are regression fits. (a) Log-log plot of the cumulative probability
distribution of the class sizes shows a power law decay $k^{-(\tau-1)}$ with $\tau \approx 2$ for
k < 200. (b) Log-linear plot of the cumulative probability distribution shows the exponential
decay for k > 200.